Information retrieval strategies for accessing African audio corpora
Authors
Abstract
In this paper we present a first approach to accessing African oral corpora that combines automatic speech recognition (ASR) and information retrieval. First, we present the principal characteristics of our Somali speech recognizer [8] and the results obtained on real audio archives gathered from Djibouti Radio. Second, we present a Hybrid Language Model (HLM) that includes both words and sub-words to improve robustness against out-of-vocabulary (OOV) words. We then carry out information retrieval experiments with various strategies, searching the different outputs of the ASR system (words, sub-words and hybrid). Finally, we present a new strategy that combines sub-words and words to enhance the information retrieval results.
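The idea of falling back from words to sub-words for OOV query terms can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how sub-words are defined or how the two outputs are combined. The sketch assumes plain-text transcripts, uses character trigrams as a stand-in for sub-word units, and introduces hypothetical names (`char_ngrams`, `build_indexes`, `search`) and a hypothetical `min_overlap` threshold purely for illustration.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams.

    Assumption: n-grams stand in for the sub-word units of the paper,
    whose actual definition is not given in the abstract.
    """
    padded = f"#{word}#"  # mark word boundaries
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_indexes(transcripts, n=3):
    """Build a word index and a sub-word index over {doc_id: text}."""
    word_index = defaultdict(set)
    subword_index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in text.lower().split():
            word_index[word].add(doc_id)
            for gram in char_ngrams(word, n):
                subword_index[gram].add(doc_id)
    return word_index, subword_index


def search(query, word_index, subword_index, n=3, min_overlap=0.6):
    """Score documents using words when possible, sub-words otherwise.

    In-vocabulary terms add 1.0 per matching document. OOV terms add the
    fraction of their n-grams found anywhere in the document, provided
    that fraction reaches min_overlap (a hypothetical threshold).
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        if term in word_index:
            for doc_id in word_index[term]:
                scores[doc_id] += 1.0
            continue
        grams = set(char_ngrams(term, n))
        hits = defaultdict(int)
        for gram in grams:
            for doc_id in subword_index.get(gram, ()):
                hits[doc_id] += 1
        for doc_id, count in hits.items():
            overlap = count / len(grams)
            if overlap >= min_overlap:
                scores[doc_id] += overlap
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With this sketch, a query term missing from the word vocabulary can still retrieve documents containing a closely related word form, which is the kind of robustness to OOV words that the abstract describes.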
Related resources
Visualizing search results in the information retrieval process
Purpose: One of the most effective ways to achieve optimum information retrieval is through visualization of information. Search strategies, probing skills, querying of information needs and analysis of information play a significant role in accessing necessary and useful information. Besides the factors mentioned above, information visualization can increase the availability level of in...
Bikers Accessing the Web: The SmartWeb Motorbike Corpus
Three advanced German speech corpora have been collected during the German SmartWeb project. One of them, the SmartWeb Motorbike Corpus (SMC), is described in this paper. As with all SmartWeb speech corpora (e.g. (Mögele et al., 2006)) SMC is designed for a dialogue system dealing with open domains. The corpus is recorded under the special circumstances of a motorbike ride and contains utteranc...
Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents
The Informedia Digital Video Library Project at Carnegie Mellon University is making large corpora of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. Information retrieval from corpora of speech recognition output is critical to the project’s success. In this paper, we out...
On the applicability of advanced information retrieval techniques for a database of African music
Music information retrieval from an archive of audio recordings of African music poses many challenges for research. Apart from audio feature extraction and audio content description, there is a need for techniques that allow flexible database querying. These techniques have to be developed in relation to a careful cultural, social and economical analysis of the user profile and the general con...
Modular System Design for Multimedial Information Handling
Often, information retrieval from various other media is analogous to text-based retrieval; however, accessing documents in e.g. audio or video formats causes some extra problems, in particular with respect to document segmentation, choice of indexing features, and robustness. We review these difficulties, together with some previous attempts to overcome them, and then describe a very flexible, mo...